Enabling Robots to Communicate their Objectives
The overarching goal of this work is to efficiently enable end-users to
correctly anticipate a robot's behavior in novel situations. Since a robot's
behavior is often a direct result of its underlying objective function, our
insight is that end-users need to have an accurate mental model of this
objective function in order to understand and predict what the robot will do.
While people naturally develop such a mental model over time through observing
the robot act, this familiarization process may be lengthy. Our approach
reduces this time by having the robot model how people infer objectives from
observed behavior, and then it selects those behaviors that are maximally
informative. The problem of computing a posterior over objectives from observed
behavior is known as Inverse Reinforcement Learning (IRL), and has been applied
to robots learning human objectives. We consider the problem where the roles of
human and robot are swapped. Our main contribution is to recognize that unlike
robots, humans will not be exact in their IRL inference. We thus introduce two
factors to define candidate approximate-inference models for human learning in
this setting, and analyze them in a user study in the autonomous driving
domain. We show that certain approximate-inference models lead to the robot
generating example behaviors that better enable users to anticipate what it
will do in novel situations. Our results also suggest, however, that additional
research is needed in modeling how humans extrapolate from examples of robot
behavior.
Comment: RSS 201
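The inference loop described above can be sketched numerically: a Boltzmann-rational observer performs approximate IRL over a discrete set of candidate objectives, and the robot selects the demonstration that most concentrates the observer's posterior on its true objective. This is a minimal illustrative sketch, not the paper's implementation; the function names, the linear reward model, and the toy feature vectors are all our assumptions.

```python
import numpy as np

def posterior_update(prior, shown_idx, traj_features, objectives, beta=1.0):
    """Observer's posterior over objectives after seeing trajectory shown_idx.

    traj_features: feature vector per candidate trajectory.
    objectives: candidate weight vectors; reward of a trajectory is w @ f.
    beta: observer rationality (higher = closer to exact IRL).
    """
    post = np.array(prior, dtype=float)
    for i, w in enumerate(objectives):
        # Boltzmann-rational likelihood: softmax over trajectory rewards under w
        logits = beta * np.array([w @ f for f in traj_features])
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        post[i] *= probs[shown_idx]
    return post / post.sum()

def most_informative_demo(prior, traj_features, objectives, true_idx, beta=1.0):
    """Robot side: pick the demo maximizing posterior mass on the true objective."""
    return max(
        range(len(traj_features)),
        key=lambda j: posterior_update(prior, j, traj_features, objectives, beta)[true_idx],
    )
```

With two orthogonal objectives and two trajectories, showing the trajectory preferred under the true objective shifts the observer's posterior toward it, which is the informativeness criterion the abstract describes.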
Coherent Soft Imitation Learning
Imitation learning methods seek to learn from an expert either through
behavioral cloning (BC) of the policy or inverse reinforcement learning (IRL)
of the reward. Such methods enable agents to learn complex tasks from humans
that are difficult to capture with hand-designed reward functions. Choosing BC
or IRL for imitation depends on the quality and state-action coverage of the
demonstrations, as well as additional access to the Markov decision process.
Hybrid strategies that combine BC and IRL are not common, as initial policy
optimization against inaccurate rewards diminishes the benefit of pretraining
the policy with BC. This work derives an imitation method that captures the
strengths of both BC and IRL. In the entropy-regularized ('soft') reinforcement
learning setting, we show that the behaviour-cloned policy can be used as both
a shaped reward and a critic hypothesis space by inverting the regularized
policy update. This coherency facilitates fine-tuning cloned policies using the
reward estimate and additional interactions with the environment. This approach
conveniently achieves imitation learning through initial behaviour cloning,
followed by refinement via RL with online or offline data sources. The
simplicity of the approach enables graceful scaling to high-dimensional and
vision-based tasks, with stable learning and minimal hyperparameter tuning, in
contrast to adversarial approaches.
Comment: 51 pages, 47 figures. DeepMind internship report
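The inversion described above admits a minimal numerical sketch for discrete actions: in entropy-regularized RL the soft update is pi_new(a|s) ∝ pi_prior(a|s) · exp(Q(s, a) / alpha), so treating a behaviour-cloned policy as one soft update away from a prior policy implies a shaped critic estimate. The function name and the toy policies below are our illustrative assumptions, not the paper's code.

```python
import numpy as np

def inverted_soft_value(log_pi_bc, log_pi_prior, alpha=1.0):
    """Q-estimate implied by inverting the regularized policy update:
    Q_hat(s, a) = alpha * (log pi_bc(a|s) - log pi_prior(a|s))."""
    return alpha * (log_pi_bc - log_pi_prior)

# Toy example: uniform prior over two actions, BC policy prefers action 0.
pi_bc = np.array([0.8, 0.2])
pi_prior = np.array([0.5, 0.5])
q_hat = inverted_soft_value(np.log(pi_bc), np.log(pi_prior))
```

The cloned policy thus doubles as a reward signal: actions it prefers over the prior receive positive shaped value, which is what makes subsequent fine-tuning with environment interactions coherent with the initial cloning step.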
On Multi-objective Policy Optimization as a Tool for Reinforcement Learning
Many advances that have improved the robustness and efficiency of deep
reinforcement learning (RL) algorithms can, in one way or another, be
understood as introducing additional objectives, or constraints, in the policy
optimization step. This includes ideas as far ranging as exploration bonuses,
entropy regularization, and regularization toward teachers or data priors when
learning from experts or in offline RL. Often, task reward and auxiliary
objectives are in conflict with each other and it is therefore natural to treat
these examples as instances of multi-objective (MO) optimization problems. We
study the principles underlying multi-objective RL (MORL) and introduce a new
algorithm,
Distillation of a Mixture of Experts (DiME), that is intuitive and
scale-invariant under some conditions. We highlight its strengths on standard
MO benchmark problems and consider case studies in which we recast offline RL
and learning from experts as MO problems. This leads to a natural algorithmic
formulation that sheds light on the connection between existing approaches. For
offline RL, we use the MO perspective to derive a simple algorithm that
optimizes for the standard RL objective plus a behavioral cloning term. This
outperforms the state of the art on two established offline RL benchmarks.
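The "RL objective plus a behavioral cloning term" loss mentioned above can be sketched as follows. This is an assumed TD3+BC-style form for illustration, not the paper's actual algorithm; the function name and weighting are ours.

```python
import numpy as np

def offline_policy_loss(q_values, policy_actions, dataset_actions, bc_weight=1.0):
    """Minimize: -Q(s, pi(s)) + bc_weight * ||pi(s) - a_dataset||^2."""
    rl_term = -np.mean(q_values)  # push the policy toward high-value actions
    bc_term = np.mean((policy_actions - dataset_actions) ** 2)  # stay near data
    return rl_term + bc_weight * bc_term
```

The two terms pull in the directions the abstract identifies as conflicting: the critic term rewards deviating toward high-value actions, while the cloning term penalizes drift from the dataset, and bc_weight trades them off.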
CpG Island microarray probe sequences derived from a physical library are representative of CpG Islands annotated on the human genome
An effective tool for the global analysis of both DNA methylation status and protein–chromatin interactions is a microarray constructed with sequences containing regulatory elements. One type of array suited for this purpose takes advantage of the strong association between CpG Islands (CGIs) and gene regulatory regions. We have obtained 20 736 clones from a CGI library and used these to construct CGI arrays. The utility of this library requires proper annotation and assessment of the clones, including CpG content, genomic origin and proximity to neighboring genes. Alignment of clone sequences to the human genome (UCSC hg17) identified 9595 distinct genomic loci; 64% were defined by a single clone while the remaining 36% were represented by multiple, redundant clones. Approximately 68% of the loci were located near a transcription start site. The distribution of these loci covered all 23 chromosomes, with 63% overlapping a bioinformatically identified CGI. The high representation of genomic CGIs in this rich collection of clones supports the use of microarrays produced with this library for the study of global epigenetic mechanisms and protein–chromatin interactions. A browsable database is available online to facilitate exploration of the CGIs in this library and their association with annotated genes or promoter elements.